Several psychometric scaling methods start from proximity data and yield structures revealing the underlying organization of the data. Data clustering and multidimensional scaling are two such methods. Network scaling represents another method based on graph theory.
Pathfinder networks are derived from proximities for pairs of entities. Proximities can be obtained from similarities, correlations, distances, conditional probabilities, or any other measure of the relationships among entities. The entities are often concepts of some sort, but they can be anything with a pattern of relationships. In the Pathfinder network, the entities correspond to the nodes of the generated network, and the links in the network are determined by the patterns of proximities. For example, if the proximities are similarities, links will generally connect nodes of high similarity.
The links in the network will be undirected if the proximities are symmetrical for every pair of entities. Symmetrical proximities mean that the order of the entities is not important, so the proximity of i and j is the same as the proximity of j and i for all pairs i, j. If the proximities are not symmetrical for every pair, the links will be directed.
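The symmetry condition can be checked directly on a proximity matrix. The following is a minimal sketch, assuming the proximities are stored as a square list of lists; the names `is_symmetric` and `link_type` and the tolerance `tol` are illustrative and not part of any Pathfinder software.

```python
def is_symmetric(prox, tol=1e-9):
    """True if prox[i][j] equals prox[j][i] (within tol) for every pair i, j."""
    n = len(prox)
    return all(
        abs(prox[i][j] - prox[j][i]) <= tol
        for i in range(n)
        for j in range(i + 1, n)
    )


def link_type(prox):
    """Kind of links the resulting network would have (hypothetical helper)."""
    return "undirected" if is_symmetric(prox) else "directed"


# Similarity ratings are typically symmetric; conditional probabilities
# such as P(j | i) usually are not.
ratings = [[0, 7, 2], [7, 0, 5], [2, 5, 0]]
cond_prob = [[0.0, 0.8, 0.1], [0.3, 0.0, 0.4], [0.1, 0.6, 0.0]]
```

Here `link_type(ratings)` reports undirected links and `link_type(cond_prob)` reports directed links.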
Here is an example of an undirected Pathfinder network derived from average similarity ratings of a group of biology graduate students. The students rated the similarity of all pairs of the terms shown.
Pathfinder uses two parameters. (1) The q parameter limits the number of links in the indirect paths examined when generating the network. It is an integer between 2 and n − 1, inclusive, where n is the number of nodes or items. (2) The r parameter defines the metric used for computing the distance of paths (cf. the Minkowski distance). It is a real number between 1 and infinity, inclusive: with r = 1 the length of a path is the sum of its link weights, and with r = ∞ it is the largest link weight on the path. A network generated with particular values of q and r is called a PFnet(q, r). Both parameters have the effect of decreasing the number of links in the network as their values are increased. The network with the minimum number of links is obtained when q = n − 1 and r = ∞, i.e., PFnet(n − 1, ∞).
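The roles of q and r can be illustrated with a small, unoptimized sketch. It assumes a symmetric matrix of distances (smaller means closer) with zeros on the diagonal, and it keeps a link only when no path of at most q links is shorter under the r-metric. The function names `combine` and `pfnet`, the tolerance, and the example matrix are illustrative assumptions, not the published algorithms.

```python
import math


def combine(a, b, r):
    """Minkowski r-metric length of two path segments joined end to end."""
    if a == 0:
        return b
    if b == 0:
        return a
    if math.isinf(r):
        return max(a, b)
    return (a ** r + b ** r) ** (1.0 / r)


def pfnet(dist, q, r):
    """Return the undirected links (i, j), i < j, of a PFnet(q, r) sketch."""
    n = len(dist)
    # best[i][j]: shortest r-metric length over paths of at most k links.
    best = [row[:] for row in dist]
    for _ in range(q - 1):
        best = [
            [min(combine(best[i][m], dist[m][j], r) for m in range(n))
             for j in range(n)]
            for i in range(n)
        ]
    links = set()
    for i in range(n):
        for j in range(i + 1, n):
            d = dist[i][j]
            if math.isinf(d):
                continue  # no direct link in the data
            # Keep the link only if no permitted path is shorter than it.
            if d <= best[i][j] or math.isclose(d, best[i][j], rel_tol=1e-9):
                links.add((i, j))
    return links


# Four nodes: a chain 0-1-2-3 of unit links, a shortcut 0-3 of length 2.5,
# and two long links 0-2 and 1-3 of length 5.
dist = [
    [0.0, 1.0, 5.0, 2.5],
    [1.0, 0.0, 1.0, 5.0],
    [5.0, 1.0, 0.0, 1.0],
    [2.5, 5.0, 1.0, 0.0],
]
```

With this matrix, `pfnet(dist, 2, math.inf)` keeps the shortcut 0-3 because only two-link paths are examined, whereas `pfnet(dist, 3, math.inf)` removes it because the three-link chain has a largest link of 1. With q = 3 and r = 1 the chain has length 3, which exceeds 2.5, so the shortcut survives again.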
With ordinal-scale data (see level of measurement), the r parameter should be infinity, because only then does the same PFnet result from any positive monotonic transformation of the proximity data. Other values of r require data measured on a ratio scale. The q parameter can be varied to yield the desired number of links in the network.
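A small numerical check shows why r = ∞ suits ordinal data. The sketch below compares one direct link against one two-link path, before and after squaring all distances (a positive monotonic transformation). The helper names `path_length` and `link_survives` and the numbers are illustrative assumptions.

```python
import math


def path_length(weights, r):
    """Minkowski r-metric length of a path with the given link weights."""
    if math.isinf(r):
        return max(weights)
    return sum(w ** r for w in weights) ** (1.0 / r)


def link_survives(direct, path, r):
    """True if the direct link is no longer than the alternative path."""
    return direct <= path_length(path, r)


direct, path = 2.5, [1.0, 2.0]
# Squaring preserves the rank order of all distances.
sq_direct, sq_path = direct ** 2, [w ** 2 for w in path]

same_with_sum = link_survives(direct, path, 1) == link_survives(sq_direct, sq_path, 1)
same_with_max = (
    link_survives(direct, path, math.inf)
    == link_survives(sq_direct, sq_path, math.inf)
)
```

With r = 1 the decision flips after squaring (the link survives for the raw values but not for the squared ones), so `same_with_sum` is false. With r = ∞ only the rank order of the weights matters, so `same_with_max` is true.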
Essentially, Pathfinder networks preserve the shortest possible paths given the data, so links are eliminated when they are not on shortest paths. The PFnet(n − 1, ∞) will be the minimum spanning tree for the links defined by the proximity data if a unique minimum spanning tree exists. In general, the PFnet(n − 1, ∞) includes all of the links in any minimum spanning tree.
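The relationship to minimum spanning trees can be checked on small examples. The sketch below assumes a complete, symmetric matrix of finite distances. It computes PFnet(n − 1, ∞) with a Floyd–Warshall-style minimax pass (a convenient substitute here, not one of the published fast algorithms) and compares the result with a tree built by Kruskal's algorithm. All function names are illustrative.

```python
def pfnet_minimax(dist):
    """Links (i, j), i < j, whose weight equals the minimax path distance."""
    n = len(dist)
    mm = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via = max(mm[i][k], mm[k][j])  # largest link on the path via k
                if via < mm[i][j]:
                    mm[i][j] = via
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if dist[i][j] == mm[i][j]}


def kruskal_mst(dist):
    """One minimum spanning tree of the complete graph, as a set of (i, j)."""
    n = len(dist)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = set()
    edges = sorted((dist[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.add((i, j))
    return tree


unique = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]  # distinct weights: one MST
tied = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]    # equal weights: several MSTs
```

For `unique`, the two functions return the same two links. For `tied`, Kruskal's algorithm returns one of the possible trees, while `pfnet_minimax` keeps all three links, which together cover every minimum spanning tree.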
Pathfinder networks are used in the study of expertise, knowledge acquisition, knowledge engineering, citation patterns, information retrieval, and data visualization. The networks are potentially applicable to any problem addressed by network theory.
Further information on Pathfinder networks and several examples of the application of PFnets to a variety of problems can be found in:
A shorter article summarizing Pathfinder networks:
Three papers describing fast implementations of Pathfinder networks:
(Quirin et al. is significantly faster, but can only be applied in cases where q = n − 1, while Guerrero-Bote et al. can be used for all cases.)